Ford GoBike System Data

by (Amr Saeed)

Preliminary Wrangling

This data set contains information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area. The dataset consists of 183412 trips, from a database snapshot taken in 2019.


What is the structure of your dataset?

The dataset has 183412 trips and 16 features: (duration_sec, start_time, end_time, start_station_id, start_station_name, start_station_latitude, start_station_longitude, end_station_id, end_station_name, end_station_latitude, end_station_longitude, bike_id, user_type, member_birth_year, member_gender, bike_share_for_all_trip).

I applied the following cleaning steps:

1- dropping null values

2- converting the duration from seconds to minutes

3- dropping the [other] gender type for simplicity

4- converting the [member_birth_year] feature to [member_age]
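The four cleaning steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the function name `clean_trips`, the derived column name `duration_min`, the gender label `"Other"`, and the reference year 2019 used to compute age are all assumptions, and the sample rows are made up for illustration.

```python
import pandas as pd


def clean_trips(trips: pd.DataFrame, reference_year: int = 2019) -> pd.DataFrame:
    """Apply the four cleaning steps to a raw trips table (illustrative sketch)."""
    # 1- drop rows that contain any null value
    cleaned = trips.dropna().copy()
    # 2- convert duration from seconds to minutes ("duration_min" is an assumed name)
    cleaned["duration_min"] = cleaned["duration_sec"] / 60
    # 3- drop the "Other" gender type (the exact label is an assumption)
    cleaned = cleaned[cleaned["member_gender"] != "Other"].copy()
    # 4- convert birth year to age; reference_year=2019 is an assumption
    cleaned["member_age"] = reference_year - cleaned["member_birth_year"].astype(int)
    return cleaned.reset_index(drop=True)


# Made-up sample rows, only to show the shape of the result
sample = pd.DataFrame({
    "duration_sec": [600, 1200, 300, 900],
    "member_birth_year": [1990.0, None, 1985.0, 1975.0],
    "member_gender": ["Male", "Female", "Other", "Female"],
})
cleaned_sample = clean_trips(sample)
print(cleaned_sample)
```

Keeping the steps in one function makes it easy to re-run the same cleaning if the raw file is reloaded.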

What is/are the main feature(s) of interest in your dataset?

Trip duration is the main feature of interest, examined with the aid of the other features.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

1- start time

2- member age

3- gender

I am interested in:

1- finding the daily peak times for trips

2- the most frequent user ages

3- the destination with the longest trip durations

Univariate Exploration


The following figure shows the frequency distribution of trip durations in minutes.

As we can see, the first plot is not clear enough to show how most trip durations are distributed, so I restricted the duration to the range below 100 minutes for better clarity.

Conclusion

Most trips last between 5 and 15 minutes (where the roughly normal distribution is centered), as shown above.
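Restricting the durations to under 100 minutes and counting trips per bin could look like the sketch below. The function name `duration_histogram`, the 5-minute bin width, and the sample values are assumptions chosen for illustration; the notebook's own plot may use different bins.

```python
import pandas as pd


def duration_histogram(duration_min: pd.Series, max_minutes: int = 100,
                       bin_width: int = 5) -> pd.Series:
    """Count trips per duration bin, keeping only trips shorter than max_minutes."""
    kept = duration_min[duration_min < max_minutes]
    # Bin edges 0, 5, 10, ..., max_minutes; each bin is closed on the left: [0, 5), [5, 10), ...
    edges = list(range(0, max_minutes + bin_width, bin_width))
    binned = pd.cut(kept, bins=edges, right=False)
    return binned.value_counts(sort=False)


# Made-up durations in minutes; the 250-minute trip is outside the plotted range
sample_durations = pd.Series([3, 7, 8, 12, 14, 9, 250])
counts = duration_histogram(sample_durations)
print(counts[counts > 0])
```

The resulting counts can be passed to a bar plot, or the same filter can be applied before calling a histogram function directly.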

Plotting the top 60 trip start stations

The top stations are ("Market St") and ("San Francisco Station 2").

I conclude that these two stations see the highest demand, so more bikes should be provided there.
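Ranking start stations by trip count is a one-line `value_counts` in pandas. The sketch below assumes the trips are in a DataFrame with the `start_station_name` column; the helper name `top_start_stations` and the station names in the sample are hypothetical.

```python
import pandas as pd


def top_start_stations(trips: pd.DataFrame, n: int = 60) -> pd.Series:
    """Return the n most frequent start stations with their trip counts."""
    return trips["start_station_name"].value_counts().head(n)


# Hypothetical station names, for illustration only
sample_trips = pd.DataFrame({
    "start_station_name": ["Station A", "Station B", "Station A",
                           "Station C", "Station A", "Station B"],
})
top_two = top_start_stations(sample_trips, n=2)
print(top_two)
```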

Which gender uses the bikes most

As shown in the figure, male riders are the most frequent users.

Plotting the start hour of the trip

The trip distribution over day hours peaks around two timeframes, 7am-9am and 4pm-6pm, during typical rush hours.
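The hourly distribution can be derived from `start_time` roughly as follows. This is a sketch under the assumption that `start_time` parses as a timestamp; the helper name `trips_per_hour` and the sample timestamps are made up.

```python
import pandas as pd


def trips_per_hour(trips: pd.DataFrame) -> pd.Series:
    """Count trips by the hour of day (0-23) in which they started."""
    start = pd.to_datetime(trips["start_time"])
    # reindex so that hours without any trip still appear with a count of 0
    return start.dt.hour.value_counts().reindex(range(24), fill_value=0)


# Made-up timestamps, for illustration only
sample_trips = pd.DataFrame({
    "start_time": ["2019-02-04 08:05:00", "2019-02-04 08:40:00",
                   "2019-02-04 17:15:00", "2019-02-04 12:30:00"],
})
hourly = trips_per_hour(sample_trips)
print(hourly[hourly > 0])
```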

Finding the main user age groups

Most members were around 20 to 40 years old, as the figure above shows.


Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

My main interest is the trip duration. Since most trips were under 100 minutes, I focused on the range below 100 minutes. I also converted the data from seconds to minutes, since seconds are harder to read.

Most trips lasted around 5 to 15 minutes.

The rush hours inferred from the frequency figures show that the peak times coincide with the usual start and end times of work and school.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

There were 329 start stations, and only the top ones were heavily used, so I focused on the top 60. The data show that most users are male. As expected, the age distribution shows that most users are between 20 and 40, which suggests that workers are the main clients.

Bivariate Exploration


The most crowded stations at rush hours

As shown in the pie plot, ("Market St") is not the most crowded station at rush hours; the two Caltrain stations ("San Francisco Caltrain" and "Caltrain Station 2") are the most crowded, with Howard St in third place.
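One way to rank stations during rush hours only is to filter on the start hour before counting. In the sketch below, the rush-hour window is taken from the peaks noted earlier (7am-9am and 4pm-6pm) and interpreted as start hours 7, 8, 16 and 17, which is an assumption; the helper name and the sample rows are hypothetical.

```python
import pandas as pd

# Start hours treated as rush hours (assumed reading of 7am-9am and 4pm-6pm)
RUSH_HOURS = {7, 8, 16, 17}


def rush_hour_station_counts(trips: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n busiest start stations among trips that start in rush hours."""
    hours = pd.to_datetime(trips["start_time"]).dt.hour
    rush_trips = trips[hours.isin(RUSH_HOURS)]
    return rush_trips["start_station_name"].value_counts().head(n)


# Hypothetical stations and timestamps, for illustration only
sample_trips = pd.DataFrame({
    "start_time": ["2019-02-04 08:05", "2019-02-04 17:20", "2019-02-04 12:00",
                   "2019-02-04 13:00", "2019-02-04 07:45"],
    "start_station_name": ["Station B", "Station B", "Station A",
                           "Station A", "Station A"],
})
rush_top = rush_hour_station_counts(sample_trips)
print(rush_top)
```

In this made-up sample, Station A is busiest overall but Station B leads during rush hours, which mirrors the kind of shift described above.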

Trip duration by gender

Male riders tend to take shorter trips than female riders.

Which gender is more likely to share a trip with others

The result shown is expected, since male users tend to be more familiar with other users.

Average trip duration by day of the week

Trips are much shorter Monday through Friday than on weekends. This indicates stable, efficient use of the sharing system on normal work days and more casual, flexible use on weekends.
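The weekday comparison can be reproduced by deriving the day name from `start_time` and averaging the duration per day. This is a sketch only: `duration_min` is an assumed column name for the duration in minutes, and the sample rows are made up.

```python
import pandas as pd

WEEK_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def mean_duration_by_weekday(trips: pd.DataFrame) -> pd.Series:
    """Average trip duration (minutes) for each day of the week, Monday first."""
    day_name = pd.to_datetime(trips["start_time"]).dt.day_name()
    # Days with no trips in the data come out as NaN after reindexing
    return trips.groupby(day_name)["duration_min"].mean().reindex(WEEK_ORDER)


# Made-up trips: 2019-02-04 is a Monday, 2019-02-02 is a Saturday
sample_trips = pd.DataFrame({
    "start_time": ["2019-02-04 08:00", "2019-02-04 17:30", "2019-02-02 11:00"],
    "duration_min": [10.0, 12.0, 30.0],
})
by_day = mean_duration_by_weekday(sample_trips)
print(by_day.dropna())
```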

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> I derived the day of the week from the start time.
> Trip duration varies between weekdays and weekends; it is much longer on weekends, possibly due to traffic jams.
> Female riders tend to take longer rides (an expected result).

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Yes, the most crowded stations change at rush hours: Market St ranked at the top overall, but at rush hours the San Francisco Caltrain stations are the most crowded.
> The second conclusion is that male users tend to share their trips.

Multivariate Exploration


How does the trip duration distribution vary by age?

Based on the univariate results, active users are concentrated between 20 and 45 years old.

I concluded that:

> 1- the weekdays are the working and college days
> 2- the peak hours are the start and end hours of work
> 3- workers and college students are the top clients

How does the trip duration vary with the age and gender of riders?

> From the previous plot, male users take more rides than female and other users, including rides with long durations. Other types of customers take long rides when they are older (50 - 60).
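A table of mean duration by age group and gender gives a numeric view of the same relationship. The sketch below is illustrative only: the 10-year age bins, the group labels, the column name `duration_min`, and the sample rows are assumptions.

```python
import pandas as pd


def duration_by_age_and_gender(trips: pd.DataFrame,
                               age_edges=(10, 20, 30, 40, 50, 60, 70)) -> pd.DataFrame:
    """Mean trip duration (minutes) per age group (rows) and gender (columns)."""
    edges = list(age_edges)
    # Labels such as "20-29"; bins are closed on the left: [20, 30)
    labels = [f"{low}-{high - 1}" for low, high in zip(edges[:-1], edges[1:])]
    age_group = pd.cut(trips["member_age"], bins=edges, labels=labels,
                       right=False).rename("age_group")
    grouped = trips.groupby([age_group, trips["member_gender"]],
                            observed=True)["duration_min"].mean()
    return grouped.unstack("member_gender")


# Made-up riders, for illustration only
sample_trips = pd.DataFrame({
    "member_age": [25, 27, 33, 55],
    "member_gender": ["Male", "Male", "Female", "Female"],
    "duration_min": [10.0, 14.0, 20.0, 30.0],
})
age_gender_table = duration_by_age_and_gender(sample_trips)
print(age_gender_table)
```

Cells with no riders in that age group and gender show up as NaN, which makes sparse groups easy to spot before drawing conclusions from them.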

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

>The multivariate exploration confirmed the previous explorations and figures.

>Rides are mainly concentrated in rush hours Monday through Friday, which indicates that workers and college students are the top clients.

>The longest rides are on weekends, possibly due to traffic jams.

>Male users are more numerous, but female users have proportionally longer trip durations.

Were there any interesting or surprising interactions between features?

The most crowded stations at rush hours differ from the most crowded stations over the whole day. I conclude that people avoid crowds at peak times unless the trip is for work or study. This conclusion is useful: the company will need to make more bikes available at specific places during rush hours and at different places during the rest of the day.
